Overview of STEM Science as Process, Method, Material, and Data Named Entities

Abstract

We are faced with an unprecedented production in scholarly publications worldwide. Stakeholders in the digital libraries posit that the document-based publishing paradigm has reached the limits of adequacy. Instead, structured, machine-interpretable, fine-grained knowledge as Knowledge Graphs (KG) is strongly advocated. In this work, we develop and analyze a large-scale structured dataset of STEM articles across 10 different disciplines, viz. Agriculture, Astronomy, Biology, Chemistry, Computer Science, Earth Science, Engineering, Material Science, Mathematics, and Medicine. Our analysis is defined over a corpus comprising 60K abstracts structured as four scientific entities: process, method, material, and data. Thus, our study presents, for the first time, an analysis of a multidisciplinary corpus under the construct of four named entity labels that are specifically selected to be domain-independent as opposed to domain-specific. The work is then inadvertently a feasibility test of characterizing multidisciplinary science with domain-independent concepts. Further, to summarize the distinct facets of each concept per discipline, a set of word cloud visualizations is offered. The STEM-NER-60k corpus, created in this work, comprises 1 M entities extracted from 60k articles obtained from a major publishing platform and is publicly released.

Similar resources

PAYMA: A Tagged Corpus of Persian Named Entities

The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...

Named Entities in Czech: Annotating Data and Developing NE Tagger

This paper deals with the treatment of Named Entities (NEs) in Czech. We introduce a two-level NE classification. We have used this classification for manual annotation of two thousand sentences, gaining more than 11,000 NE instances. Employing the annotated data and Machine-Learning techniques (namely the top-down induction of decision trees), we have developed and evaluated a software system ...

Annotation of Chemical Named Entities

We describe the annotation of chemical named entities in scientific text. A set of annotation guidelines defines 5 types of named entities, and provides instructions for the resolution of special cases. A corpus of fulltext chemistry papers was annotated, with an inter-annotator agreement score of 93%. An investigation of named entity recognition using LingPipe suggests that scores of 63% are p...

Separating Named Entities

In this paper, we analyze long sequences of mostly capitalized words that look like a single named entity but in fact consist of several named entities. An example of such a phenomenon is hokejista (hockey player) New York Rangers Jaromír Jágr. Without splitting the sequence correctly, we will wrongly assume that the whole capitalized sequence is the name of the hockey player. To fin...

AGDISTIS - Graph-Based Disambiguation of Named Entities Using Linked Data

Over the last decades, several billion Web pages have been made available on the Web. The ongoing transition from the current Web of unstructured data to the Web of Data yet requires scalable and accurate approaches for the extraction of structured data in RDF (Resource Description Framework) from these websites. One of the key steps towards extracting RDF from text is the disambiguation of nam...

Journal

Journal title: Knowledge

Year: 2022

ISSN: 2809-4042, 2809-4034

DOI: https://doi.org/10.3390/knowledge2040042